Identifying lung-specific genes for each cell type¶

Procedures¶

  1. Perform QCs separately for lung and spleen datasets
  2. Integrate two datasets for joint downstream analyses
    • Identify anchors that link cells from either dataset
  3. Perform dimensional reduction and clustering
  4. Annotate clusters with CellTypist
  5. DE analysis across cell types
  6. DE analysis across tissues
  7. GO analysis

1 Performing QCs separately for lungs and spleens¶

Aggregated lung samples¶

Loading required package: Signac

Aggregated spleen samples¶

2 Integraion of lung and spleen datasets¶

The joint analysis of multiple single-cell datasets is achieved by identifying cross-dataset pairs of cells that have shared cell populations, named as anchors. This helps address two problems:

  1. correct for technical differences between datasets
  2. perform comparative analaysis across experimental conditions
In [ ]:
#Data normalization and variable feature selection
lung.list <- SplitObject(lung, split.by = "orig.ident")
lung.list <- lapply(X = lung.list, FUN = function(x) {
    x <- NormalizeData(x, verbose = FALSE)
    x <- FindVariableFeatures(x, verbose = FALSE)
})

#select integration features
features <- SelectIntegrationFeatures(object.list = lung.list)
lung.list <- lapply(X = lung.list, FUN = function(x) {
    x <- ScaleData(x, features = features, verbose = FALSE)
    x <- RunPCA(x, features = features, verbose = FALSE)
})

#iteratively find anchors that link cells across two datasets
anchors <- FindIntegrationAnchors(object.list = lung.list, reference = c(1, 2), reduction = "rpca",
    dims = 1:20)#1:50
lung.integrated <- IntegrateData(anchorset = anchors, dims = 1:20)

3 Dimensional reduction and clustering on all cells¶

4 Annotate cell types with CellTypist¶

Cell type compositions¶

Major cell types

T cell subsets

In [32]:
immune.combined
An object of class Seurat 
321136 features across 53647 samples within 4 assays 
Active assay: peaks (133665 features, 133665 variable features)
 3 other assays present: RNA, ATAC, integrated
 3 dimensional reductions calculated: pca, umap, lsi

6 DE analysis across tissues¶

6 DE analysis across tissues¶

Differential expression test was performed on normalized values stored in immune.combine[["RNA"]]@data. Here are the normalization steps:

  1. Normalize by the total expression for each cell
  2. multiply by a scale factor 10,000
  3. log-transform the result

DE test on merged T cells¶

Volcano plots and QQ plots comparing against null

In [10]:
p1 + p2
In [ ]:
eg = bitr(rownames(sig.genes), fromType = "SYMBOL", toType="ENTREZID", OrgDb = "org.Hs.eg.db")
    all.eg = bitr(rownames(DEG.test), fromType = "SYMBOL", toType="ENTREZID", OrgDb = "org.Hs.eg.db")

    go_list <- enrichGO(gene          = eg$ENTREZID,
                    universe      = names(all.eg$ENTREZID),
                    OrgDb         = org.Hs.eg.db,
                    ont           = "BP",
                    pAdjustMethod = "BH",
                    pvalueCutoff  = 0.01,
                    qvalueCutoff  = 0.05,
                    readable      = TRUE)
In [8]:
library(forcats)
library(enrichplot)
go_list<-mutate(go_list, richFactor = Count / as.numeric(sub("/\\d+", "", BgRatio)))

ggplot(go_list, showCategory = 20, 
                        aes(richFactor, fct_reorder(Description, richFactor))) + 
                            geom_segment(aes(xend=0, yend = Description)) +
                            geom_point(aes(color=qvalue, size = Count)) +
                            scale_color_viridis_c(guide=guide_colorbar(reverse=TRUE)) +
                            scale_size_continuous(range=c(2, 10)) +
                            theme_minimal() + theme(axis.text.y=element_text(size=12)) + 
                            xlab("rich factor") +
                            ylab(NULL)

Visualizing results via volcano plots¶

Color scheme: Red: negative log2FC
blue: positive log2FC
grey: data points not passing bonferroni corrected cutoff 0.05

[[1]]
NULL
[[1]]
NULL

Summerizing number of DEGs by direction¶

Top DE genes labeled on Volcano plots¶

Warning message:
“ggrepel: 167 unlabeled data points (too many overlaps). Consider increasing max.overlaps”
Warning message:
“ggrepel: 94 unlabeled data points (too many overlaps). Consider increasing max.overlaps”
Warning message:
“ggrepel: 88 unlabeled data points (too many overlaps). Consider increasing max.overlaps”
Warning message:
“ggrepel: 237 unlabeled data points (too many overlaps). Consider increasing max.overlaps”
Warning message:
“ggrepel: 221 unlabeled data points (too many overlaps). Consider increasing max.overlaps”
Warning message:
“ggrepel: 295 unlabeled data points (too many overlaps). Consider increasing max.overlaps”
Warning message:
“ggrepel: 63 unlabeled data points (too many overlaps). Consider increasing max.overlaps”
Warning message:
“ggrepel: 39 unlabeled data points (too many overlaps). Consider increasing max.overlaps”
[[1]]
NULL

7 Gene set enrichment analysis via ClusterProfiler¶

Identify the known pathways that are enriched with the differentially expressed genes in lungs

Lung-specific genes in regulatory T cells¶

Summerize top 20 GO terms in molecular functions¶

  • rich factor as ratio between #DE genes in GO terms and #non-DE genes in the background
  • The brighter the color of dots shows, the test becomes more significant
[[1]]
NULL
[[1]]
NULL
[[1]]
NULL